Commit b39d0ee

rientjes authored and torvalds committed
mm, page_alloc: avoid expensive reclaim when compaction may not succeed

Memory compaction has a couple significant drawbacks as the allocation
order increases, specifically:

 - isolate_freepages() is responsible for finding free pages to use as
   migration targets and is implemented as a linear scan of memory
   starting at the end of a zone,

 - failing order-0 watermark checks in memory compaction does not account
   for how far below the watermarks the zone actually is: to enable
   migration, there must be *some* free memory available. Per the above,
   watermarks are not always sufficient if isolate_freepages() cannot
   find the free memory but it could require hundreds of MBs of reclaim to
   even reach this threshold (read: potentially very expensive reclaim with
   no indication compaction can be successful), and

 - if compaction at this order has failed recently so that it does not even
   run as a result of deferred compaction, looping through reclaim can often
   be pointless.

For hugepage allocations, these are quite substantial drawbacks because
these are very high order allocations (order-9 on x86) and falling back to
doing reclaim can potentially be *very* expensive without any indication
that compaction would even be successful.

Reclaim itself is unlikely to free entire pageblocks and certainly no
reliance should be put on it to do so in isolation (recall lumpy reclaim).
This means we should avoid reclaim and simply fail hugepage allocation if
compaction is deferred.

It is also not helpful to thrash a zone by doing excessive reclaim if
compaction may not be able to access that memory. If order-0 watermarks
fail and the allocation order is sufficiently large, it is likely better
to fail the allocation rather than thrashing the zone.

Signed-off-by: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Stefan Priebe - Profihost AG <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>

1 parent 19deb76 commit b39d0ee
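
To put the "order-9 on x86" remark from the commit message in concrete terms: an order-n allocation is 2^n physically contiguous pages. The sketch below assumes 4 KiB base pages (the usual x86 configuration); `SKETCH_PAGE_SHIFT` and `sketch_order_to_bytes()` are hypothetical names for illustration, not kernel symbols.

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages, i.e. a page shift of 12. */
#define SKETCH_PAGE_SHIFT 12

/* Bytes covered by a single allocation of the given order (2^order pages). */
size_t sketch_order_to_bytes(unsigned int order)
{
	return (size_t)1 << (SKETCH_PAGE_SHIFT + order);
}
```

Under that assumption an order-0 request is a single 4096-byte page, while an order-9 request is 512 contiguous pages, or 2 MiB, which is why reclaiming individual pages is unlikely to satisfy it without compaction.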

1 file changed: +22 -0 lines changed

mm/page_alloc.c

Lines changed: 22 additions & 0 deletions
@@ -4458,6 +4458,28 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 		if (page)
 			goto got_pg;
 
+		if (order >= pageblock_order && (gfp_mask & __GFP_IO)) {
+			/*
+			 * If allocating entire pageblock(s) and compaction
+			 * failed because all zones are below low watermarks
+			 * or is prohibited because it recently failed at this
+			 * order, fail immediately.
+			 *
+			 * Reclaim is
+			 *  - potentially very expensive because zones are far
+			 *    below their low watermarks or this is part of very
+			 *    bursty high order allocations,
+			 *  - not guaranteed to help because isolate_freepages()
+			 *    may not iterate over freed pages as part of its
+			 *    linear scan, and
+			 *  - unlikely to make entire pageblocks free on its
+			 *    own.
+			 */
+			if (compact_result == COMPACT_SKIPPED ||
+			    compact_result == COMPACT_DEFERRED)
+				goto nopage;
+		}
+
 		/*
 		 * Checks for costly allocations with __GFP_NORETRY, which
 		 * includes THP page fault allocations
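
The condition added by this hunk can be read as a small predicate. The following is a userspace sketch only, not kernel code: every `sketch_`/`SKETCH_` name is a hypothetical stand-in, the enum values and flag bit are illustrative, and `SKETCH_PAGEBLOCK_ORDER` is assumed to be 9 (a typical value on x86 with 2 MiB hugepages; in the kernel `pageblock_order` depends on configuration).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's compaction results. */
enum sketch_compact_result {
	SKETCH_COMPACT_SKIPPED,   /* models COMPACT_SKIPPED */
	SKETCH_COMPACT_DEFERRED,  /* models COMPACT_DEFERRED */
	SKETCH_COMPACT_OTHER      /* any other outcome */
};

#define SKETCH_GFP_IO          0x1u  /* illustrative bit standing in for __GFP_IO */
#define SKETCH_PAGEBLOCK_ORDER 9u    /* assumption: x86 with 2 MiB hugepages */

/*
 * Returns true when the allocation would jump to "nopage" instead of
 * entering reclaim: the request covers at least one whole pageblock,
 * the caller allows I/O, and compaction was either skipped or deferred.
 */
bool sketch_should_fail_fast(unsigned int order, unsigned int gfp_mask,
			     enum sketch_compact_result result)
{
	if (order < SKETCH_PAGEBLOCK_ORDER || !(gfp_mask & SKETCH_GFP_IO))
		return false;

	return result == SKETCH_COMPACT_SKIPPED ||
	       result == SKETCH_COMPACT_DEFERRED;
}
```

In this model an order-9 request with the I/O flag set and a deferred compaction result fails immediately, whereas an order-8 request, a request without the I/O flag, or any other compaction result continues along the existing slow path.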
