You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Revert "mm, mempool: only set __GFP_NOMEMALLOC if there are free elements"
This reverts commit f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC
if there are free elements").
There has been a report about OOM killer invoked when swapping out to a
dm-crypt device. The primary reason seems to be that the swapout out IO
managed to completely deplete memory reserves. Ondrej was able to
bisect and explained the issue by pointing to f9054c7 ("mm,
mempool: only set __GFP_NOMEMALLOC if there are free elements").
The reason is that the swapout path is not throttled properly because
the md-raid layer needs to allocate from the generic_make_request path
which means it allocates from the PF_MEMALLOC context. dm layer uses
mempool_alloc in order to guarantee a forward progress which used to
inhibit access to memory reserves when using page allocator. This has
changed by f9054c7 ("mm, mempool: only set __GFP_NOMEMALLOC if
there are free elements") which has dropped the __GFP_NOMEMALLOC
protection when the memory pool is depleted.
If we are running out of memory and the only way forward to free memory
is to perform swapout we just keep consuming memory reserves rather than
throttling the mempool allocations and allowing the pending IO to
complete up to a moment when the memory is depleted completely and there
is no way forward but invoking the OOM killer. This is less than
optimal.
The original intention of f9054c7 was to help with the OOM
situations where the oom victim depends on mempool allocation to make a
forward progress. David has mentioned the following backtrace:
schedule
schedule_timeout
io_schedule_timeout
mempool_alloc
__split_and_process_bio
dm_request
generic_make_request
submit_bio
mpage_readpages
ext4_readpages
__do_page_cache_readahead
ra_submit
filemap_fault
handle_mm_fault
__do_page_fault
do_page_fault
page_fault
We do not know more about why the mempool is depleted without being
replenished in time, though. In any case the dm layer shouldn't depend
on any allocations outside of the dedicated pools so a forward progress
should be guaranteed. If this is not the case then the dm should be
fixed rather than papering over the problem and postponing it to later
by accessing more memory reserves.
mempools are a mechanism to maintain dedicated memory reserves to
guaratee forward progress. Allowing them an unbounded access to the
page allocator memory reserves is going against the whole purpose of
this mechanism.
Bisected by Ondrej Kozina.
[[email protected]: coding-style fixes]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Michal Hocko <[email protected]>
Reported-by: Ondrej Kozina <[email protected]>
Reviewed-by: Johannes Weiner <[email protected]>
Acked-by: NeilBrown <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Mikulas Patocka <[email protected]>
Cc: Ondrej Kozina <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
0 commit comments