
Commit 0aaa29a

gormanm authored and torvalds committed
mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand
High-order watermark checking exists for two reasons: kswapd high-order awareness and protection for high-order atomic requests. Historically the kernel depended on MIGRATE_RESERVE to preserve min_free_kbytes as high-order free pages for as long as possible. This patch introduces MIGRATE_HIGHATOMIC, which reserves pageblocks for high-order atomic allocations on demand and avoids using those blocks for order-0 allocations. This is more flexible and reliable than MIGRATE_RESERVE was.

A MIGRATE_HIGHATOMIC pageblock is created when an atomic high-order allocation request steals a pageblock, with the total limited to 1% of the zone. Callers that speculatively abuse atomic allocations for long-lived high-order allocations in order to access the reserve will quickly fail. Note that SLUB is currently not such an abuser as it reclaims at least once. It is possible that the stolen pageblock has few suitable high-order pages and another will need to be stolen in the near future, but there would need to be strong justification to search all pageblocks for an ideal candidate.

The pageblocks are unreserved if an allocation fails after a direct reclaim attempt.

The watermark checks account for the reserved pageblocks when the allocation request is not a high-order atomic allocation.

The reserved pageblocks cannot be used for order-0 allocations. This may allow temporary wastage until a failed reclaim reassigns the pageblock. This is deliberate: the intent of the reservation is to satisfy a limited number of short-lived high-order atomic requests if the system requires them.

The stutter benchmark was used for evaluation. While it ran, a systemtap script randomly allocated between one high-order page and 12.5% of memory's worth of order-3 pages using GFP_ATOMIC. This is much larger than the potential reserve and does not attempt to be realistic; it is intended to stress random high-order allocations from an unknown source and to show that failures are reduced without introducing an anomaly where atomic allocations become more reliable than regular allocations. The amount of memory reserved varied throughout the workload as reserves were created and reclaimed under memory pressure. The allocation failure rates once the workload warmed up were:

    4.2-rc5-vanilla          70%
    4.2-rc5-atomic-reserve   56%

The failure rate was also measured while building multiple kernels: it was 14% without the patch and 6% with it applied.

Overall, this is a small reduction, but the reserves are small relative to the number of allocation requests. In early versions of the patch the failure rate was reduced by a much larger amount, but that required much larger reserves and perversely made atomic allocations seem more reliable than regular allocations.

[[email protected]: fix redundant check and a memory leak]
Signed-off-by: Mel Gorman <[email protected]>
Acked-by: Vlastimil Babka <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Rik van Riel <[email protected]>
Signed-off-by: yalin wang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
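As an illustration (not part of the commit; the function below is hypothetical), the reserve targets short-lived requests of this shape: GFP_ATOMIC combined with order > 0, which cannot sleep or enter direct reclaim and is expected to recover cheaply on failure.

#include <linux/gfp.h>

/*
 * Hypothetical caller, for illustration only. An order-3 GFP_ATOMIC
 * request is typically granted ALLOC_HARDER rights and may therefore
 * be served from a MIGRATE_HIGHATOMIC pageblock.
 */
static struct page *example_rx_buffer_alloc(void)
{
        struct page *page = alloc_pages(GFP_ATOMIC, 3);

        /* On failure the caller recovers cheaply, e.g. by dropping a packet. */
        return page;
}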
1 parent 974a786 commit 0aaa29a

File tree

3 files changed, +135 -10 lines changed

include/linux/mmzone.h

Lines changed: 4 additions & 2 deletions
@@ -39,6 +39,8 @@ enum {
 	MIGRATE_UNMOVABLE,
 	MIGRATE_MOVABLE,
 	MIGRATE_RECLAIMABLE,
+	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
+	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
 #ifdef CONFIG_CMA
 	/*
 	 * MIGRATE_CMA migration type is designed to mimic the way
@@ -61,8 +63,6 @@ enum {
 	MIGRATE_TYPES
 };
 
-#define MIGRATE_PCPTYPES (MIGRATE_RECLAIMABLE+1)
-
 #ifdef CONFIG_CMA
 #  define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
 #else
@@ -334,6 +334,8 @@ struct zone {
 	/* zone watermarks, access with *_wmark_pages(zone) macros */
 	unsigned long watermark[NR_WMARK];
 
+	unsigned long nr_reserved_highatomic;
+
 	/*
 	 * We don't know if the memory that we're going to allocate will be freeable
 	 * or/and it will be released eventually, so to avoid totally wasting several
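For reference, a small standalone sketch (userspace C, not the kernel header; CMA and ISOLATE omitted) of the reordered enum shows why the per-cpu lists still cover exactly the first three migratetypes while MIGRATE_HIGHATOMIC shares the next free-list index:

#include <stdio.h>

/* Standalone mock of the reordered enum, for illustration only. */
enum {
        MIGRATE_UNMOVABLE,
        MIGRATE_MOVABLE,
        MIGRATE_RECLAIMABLE,
        MIGRATE_PCPTYPES,       /* the number of types on the pcp lists */
        MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
        MIGRATE_TYPES
};

int main(void)
{
        /* Prints 3 3 4: pcp lists span indices 0..2, HIGHATOMIC sits at 3. */
        printf("%d %d %d\n", MIGRATE_PCPTYPES, MIGRATE_HIGHATOMIC, MIGRATE_TYPES);
        return 0;
}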

mm/page_alloc.c

Lines changed: 130 additions & 8 deletions
@@ -1615,6 +1615,101 @@ int find_suitable_fallback(struct free_area *area, unsigned int order,
 	return -1;
 }
 
+/*
+ * Reserve a pageblock for exclusive use of high-order atomic allocations if
+ * there are no empty page blocks that contain a page with a suitable order
+ */
+static void reserve_highatomic_pageblock(struct page *page, struct zone *zone,
+				unsigned int alloc_order)
+{
+	int mt;
+	unsigned long max_managed, flags;
+
+	/*
+	 * Limit the number reserved to 1 pageblock or roughly 1% of a zone.
+	 * Check is race-prone but harmless.
+	 */
+	max_managed = (zone->managed_pages / 100) + pageblock_nr_pages;
+	if (zone->nr_reserved_highatomic >= max_managed)
+		return;
+
+	spin_lock_irqsave(&zone->lock, flags);
+
+	/* Recheck the nr_reserved_highatomic limit under the lock */
+	if (zone->nr_reserved_highatomic >= max_managed)
+		goto out_unlock;
+
+	/* Yoink! */
+	mt = get_pageblock_migratetype(page);
+	if (mt != MIGRATE_HIGHATOMIC &&
+			!is_migrate_isolate(mt) && !is_migrate_cma(mt)) {
+		zone->nr_reserved_highatomic += pageblock_nr_pages;
+		set_pageblock_migratetype(page, MIGRATE_HIGHATOMIC);
+		move_freepages_block(zone, page, MIGRATE_HIGHATOMIC);
+	}
+
+out_unlock:
+	spin_unlock_irqrestore(&zone->lock, flags);
+}
+
+/*
+ * Used when an allocation is about to fail under memory pressure. This
+ * potentially hurts the reliability of high-order allocations when under
+ * intense memory pressure but failed atomic allocations should be easier
+ * to recover from than an OOM.
+ */
+static void unreserve_highatomic_pageblock(const struct alloc_context *ac)
+{
+	struct zonelist *zonelist = ac->zonelist;
+	unsigned long flags;
+	struct zoneref *z;
+	struct zone *zone;
+	struct page *page;
+	int order;
+
+	for_each_zone_zonelist_nodemask(zone, z, zonelist, ac->high_zoneidx,
+								ac->nodemask) {
+		/* Preserve at least one pageblock */
+		if (zone->nr_reserved_highatomic <= pageblock_nr_pages)
+			continue;
+
+		spin_lock_irqsave(&zone->lock, flags);
+		for (order = 0; order < MAX_ORDER; order++) {
+			struct free_area *area = &(zone->free_area[order]);
+
+			if (list_empty(&area->free_list[MIGRATE_HIGHATOMIC]))
+				continue;
+
+			page = list_entry(area->free_list[MIGRATE_HIGHATOMIC].next,
+						struct page, lru);
+
+			/*
+			 * It should never happen but changes to locking could
+			 * inadvertently allow a per-cpu drain to add pages
+			 * to MIGRATE_HIGHATOMIC while unreserving so be safe
+			 * and watch for underflows.
+			 */
+			zone->nr_reserved_highatomic -= min(pageblock_nr_pages,
				zone->nr_reserved_highatomic);
+
+			/*
+			 * Convert to ac->migratetype and avoid the normal
+			 * pageblock stealing heuristics. Minimally, the caller
+			 * is doing the work and needs the pages. More
+			 * importantly, if the block was always converted to
+			 * MIGRATE_UNMOVABLE or another type then the number
+			 * of pageblocks that cannot be completely freed
+			 * may increase.
+			 */
+			set_pageblock_migratetype(page, ac->migratetype);
+			move_freepages_block(zone, page, ac->migratetype);
+			spin_unlock_irqrestore(&zone->lock, flags);
+			return;
+		}
+		spin_unlock_irqrestore(&zone->lock, flags);
+	}
+}
+
 /* Remove an element from the buddy allocator from the fallback list */
 static inline struct page *
 __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
@@ -1670,7 +1765,7 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
  * Call me with the zone->lock already held.
  */
 static struct page *__rmqueue(struct zone *zone, unsigned int order,
-				int migratetype)
+				int migratetype, gfp_t gfp_flags)
 {
 	struct page *page;
 
@@ -1700,7 +1795,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
 
 	spin_lock(&zone->lock);
 	for (i = 0; i < count; ++i) {
-		struct page *page = __rmqueue(zone, order, migratetype);
+		struct page *page = __rmqueue(zone, order, migratetype, 0);
 		if (unlikely(page == NULL))
 			break;
 
@@ -2072,7 +2167,7 @@ int split_free_page(struct page *page)
 static inline
 struct page *buffered_rmqueue(struct zone *preferred_zone,
 			struct zone *zone, unsigned int order,
-			gfp_t gfp_flags, int migratetype)
+			gfp_t gfp_flags, int alloc_flags, int migratetype)
 {
 	unsigned long flags;
 	struct page *page;
@@ -2115,7 +2210,15 @@ struct page *buffered_rmqueue(struct zone *preferred_zone,
 			WARN_ON_ONCE(order > 1);
 		}
 		spin_lock_irqsave(&zone->lock, flags);
-		page = __rmqueue(zone, order, migratetype);
+
+		page = NULL;
+		if (alloc_flags & ALLOC_HARDER) {
+			page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
+			if (page)
+				trace_mm_page_alloc_zone_locked(page, order, migratetype);
+		}
+		if (!page)
+			page = __rmqueue(zone, order, migratetype, gfp_flags);
 		spin_unlock(&zone->lock);
 		if (!page)
 			goto failed;
@@ -2226,15 +2329,24 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 			unsigned long mark, int classzone_idx, int alloc_flags,
 			long free_pages)
 {
-	/* free_pages may go negative - that's OK */
 	long min = mark;
 	int o;
 	long free_cma = 0;
 
+	/* free_pages may go negative - that's OK */
 	free_pages -= (1 << order) - 1;
+
 	if (alloc_flags & ALLOC_HIGH)
 		min -= min / 2;
-	if (alloc_flags & ALLOC_HARDER)
+
+	/*
+	 * If the caller does not have rights to ALLOC_HARDER then subtract
+	 * the high-atomic reserves. This will over-estimate the size of the
+	 * atomic reserve but it avoids a search.
+	 */
+	if (likely(!(alloc_flags & ALLOC_HARDER)))
+		free_pages -= z->nr_reserved_highatomic;
+	else
 		min -= min / 4;
 
 #ifdef CONFIG_CMA
@@ -2419,10 +2531,18 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int alloc_flags,
 
 try_this_zone:
 		page = buffered_rmqueue(ac->preferred_zone, zone, order,
-				gfp_mask, ac->migratetype);
+				gfp_mask, alloc_flags, ac->migratetype);
 		if (page) {
 			if (prep_new_page(page, order, gfp_mask, alloc_flags))
 				goto try_this_zone;
+
+			/*
+			 * If this is a high-order atomic allocation then check
+			 * if the pageblock should be reserved for the future
+			 */
+			if (unlikely(order && (alloc_flags & ALLOC_HARDER)))
+				reserve_highatomic_pageblock(page, zone, order);
+
 			return page;
 		}
 	}
@@ -2695,9 +2815,11 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
-	 * pages are pinned on the per-cpu lists. Drain them and try again
+	 * pages are pinned on the per-cpu lists or in high alloc reserves.
+	 * Shrink them them and try again
 	 */
 	if (!page && !drained) {
+		unreserve_highatomic_pageblock(ac);
 		drain_all_pages(NULL);
 		drained = true;
 		goto retry;
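To make the __zone_watermark_ok change concrete, here is a simplified standalone sketch (illustrative flag values and numbers; the per-order free-list checks of the real function are omitted) of how a normal request is charged for the high-atomic reserve while an ALLOC_HARDER request is not:

#include <stdbool.h>
#include <stdio.h>

#define ALLOC_HIGH      0x1
#define ALLOC_HARDER    0x2

static bool sketch_watermark_ok(long free_pages, long mark, int alloc_flags,
                                unsigned int order, long nr_reserved_highatomic)
{
        long min = mark;

        /* free_pages may go negative - that's OK */
        free_pages -= (1 << order) - 1;

        if (alloc_flags & ALLOC_HIGH)
                min -= min / 2;

        /*
         * A normal request may not dip into the high-atomic reserve, so the
         * reserved pages are treated as if they were not free at all.
         */
        if (!(alloc_flags & ALLOC_HARDER))
                free_pages -= nr_reserved_highatomic;
        else
                min -= min / 4;

        return free_pages > min;
}

int main(void)
{
        /* 2048 free pages, 512 reserved for high-atomic, watermark 1800. */
        printf("order-0 normal request: %d\n",
               sketch_watermark_ok(2048, 1800, 0, 0, 512));             /* 0: fails */
        printf("order-3 atomic request: %d\n",
               sketch_watermark_ok(2048, 1800, ALLOC_HARDER, 3, 512));  /* 1: passes */
        return 0;
}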

mm/vmstat.c

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -923,6 +923,7 @@ static char * const migratetype_names[MIGRATE_TYPES] = {
923923
"Unmovable",
924924
"Reclaimable",
925925
"Movable",
926+
"HighAtomic",
926927
#ifdef CONFIG_CMA
927928
"CMA",
928929
#endif
