
Commit ea57485

tehcaster authored and torvalds committed
mm, page_alloc: fix check for NULL preferred_zone
Patch series "fix premature OOM regression in 4.7+ due to cpuset races".

This is v2 of my attempt to fix the recent report based on LTP cpuset stress test [1]. The intention is to go to stable 4.9 LTS with this, as triggering repeated OOMs is not nice. That's why the patches try to be not too intrusive.

Unfortunately, while investigating I found that modifying the testcase to use per-VMA policies instead of per-task policies will bring the OOMs back, but that seems to be a much older and harder-to-fix problem. I have posted an RFC [2] but I believe that fixing the recent regressions has a higher priority.

Longer-term we might try to think how to fix the cpuset mess in a better and less error-prone way. I was for example very surprised to learn that cpuset updates change not only task->mems_allowed, but also the nodemask of mempolicies. Until now I expected the parameter to alloc_pages_nodemask() to be stable. I wonder why we then treat cpusets specially in get_page_from_freelist() and distinguish HARDWALL etc, when there's unconditional intersection between mempolicy and cpuset. I would expect the nodemask adjustment for saving overhead in g_p_f(), but that clearly doesn't happen in the current form. So we have both crazy complexity and overhead, AFAICS.

[1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
[2] https://lkml.kernel.org/r/[email protected]

This patch (of 4):

Since commit c33d6c0 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") we have a wrong check for NULL preferred_zone, which can theoretically happen due to concurrent cpuset modification. We check the zoneref pointer, which is never NULL, when we should check the zone pointer. Also document this in the first_zones_zonelist() comment, per Michal Hocko.

Fixes: c33d6c0 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Vlastimil Babka <[email protected]>
Acked-by: Mel Gorman <[email protected]>
Acked-by: Hillf Danton <[email protected]>
Cc: Ganapatrao Kulkarni <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
1 parent ff7a28a commit ea57485

2 files changed, 6 insertions(+), 2 deletions(-)

include/linux/mmzone.h

Lines changed: 5 additions & 1 deletion
@@ -972,12 +972,16 @@ static __always_inline struct zoneref *next_zones_zonelist(struct zoneref *z,
  * @zonelist - The zonelist to search for a suitable zone
  * @highest_zoneidx - The zone index of the highest zone to return
  * @nodes - An optional nodemask to filter the zonelist with
- * @zone - The first suitable zone found is returned via this parameter
+ * @return - Zoneref pointer for the first suitable zone found (see below)
  *
  * This function returns the first zone at or below a given zone index that is
  * within the allowed nodemask. The zoneref returned is a cursor that can be
  * used to iterate the zonelist with next_zones_zonelist by advancing it by
  * one before calling.
+ *
+ * When no eligible zone is found, zoneref->zone is NULL (zoneref itself is
+ * never NULL). This may happen either genuinely, or due to concurrent nodemask
+ * update due to cpuset modification.
  */
 static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
 					enum zone_type highest_zoneidx,

mm/page_alloc.c

Lines changed: 1 addition & 1 deletion
@@ -3784,7 +3784,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order,
 	 */
 	ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
 					ac.high_zoneidx, ac.nodemask);
-	if (!ac.preferred_zoneref) {
+	if (!ac.preferred_zoneref->zone) {
 		page = NULL;
 		goto no_zone;
 	}
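
The reason the old check could never fire can be illustrated with a small userspace sketch. This is not kernel code: the types are simplified stand-ins, the nodemask is modeled as a plain bitmask, and the names mock_first_zones_zonelist(), no_zone_buggy(), no_zone_fixed() and MOCK_MAX_ZONES are hypothetical, introduced only for illustration. The sketch assumes, as the updated comment states, that the lookup returns a cursor into an array terminated by an entry whose zone pointer is NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumption, not the real layout). */
struct zone {
	int node;		/* NUMA node this zone belongs to */
};

struct zoneref {
	struct zone *zone;	/* NULL marks the end-of-list sentinel */
	int zone_idx;
};

#define MOCK_MAX_ZONES 4	/* hypothetical capacity for this sketch */

struct zonelist {
	/* One extra slot so the array always ends in a sentinel entry. */
	struct zoneref refs[MOCK_MAX_ZONES + 1];
};

/* Nodemask modeled as a bitmask: bit n set means node n is allowed. */
static bool node_allowed(unsigned int nodemask, int node)
{
	return (nodemask >> node) & 1u;
}

/*
 * Mock lookup: skip entries that are above highest_zoneidx or filtered out
 * by the nodemask, stopping at the first eligible entry or at the sentinel.
 * The result always points into refs[], so it is never NULL.
 */
static struct zoneref *mock_first_zones_zonelist(struct zonelist *zl,
						 int highest_zoneidx,
						 unsigned int nodemask)
{
	struct zoneref *z = zl->refs;

	while (z->zone && (z->zone_idx > highest_zoneidx ||
			   !node_allowed(nodemask, z->zone->node)))
		z++;
	return z;
}

/* Shape of the pre-fix test: the cursor is never NULL, so this is never true. */
static bool no_zone_buggy(const struct zoneref *ref)
{
	return !ref;
}

/* Shape of the post-fix test: look at the zone the cursor points to. */
static bool no_zone_fixed(const struct zoneref *ref)
{
	assert(ref != NULL);	/* invariant documented by the new comment */
	return !ref->zone;
}
```

In this model, a lookup with an empty nodemask (standing in for a mask emptied by a concurrent cpuset update) walks to the sentinel entry. no_zone_buggy() reports that a zone was found, while no_zone_fixed() correctly reports that none is eligible.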
